NSF PAR Search | NSF Public Access Repository

Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning

Luo, M; Xue, Z; Dimakis, A; Grauman, K (June 2025, CVPR 2025)

Egocentric and exocentric perspectives of human action differ significantly, yet overcoming this extreme viewpoint gap is critical in augmented reality and robotics. We propose VIEWPOINTROSETTA, an approach that unlocks large-scale unpaired ego and exo video data to learn clip-level viewpoint-invariant video representations. Our framework introduces (1) a diffusion-based Rosetta Stone Translator (RST), which, leveraging a moderate amount of synchronized multi-view videos, serves as a translator in feature space to decipher the alignment between unpaired ego and exo data, and (2) a dual encoder that aligns unpaired data representations through contrastive learning with RST-based synthetic feature augmentation and soft alignment. To evaluate the learned features in a standardized setting, we construct a new cross-view benchmark using Ego-Exo4D, covering cross-view retrieval, action recognition, and skill assessment tasks. Our framework demonstrates superior cross-view understanding compared to previous view-invariant learning and ego video representation learning approaches, and opens the door to bringing vast amounts of traditional third-person video to bear on the more nascent first-person setting.
Full Text Available
Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models

Raoof, N; Rout, L; Daras, G; Sanghavi, S; Caramanis, C; Shakkottai, S; Dimakis, A (March 2025, ICLR 2025)

In pretraining data detection, the goal is to detect whether a given sentence is in the dataset used for training a Large Language Model (LLM). Recent methods (such as Min-K% and Min-K%++) reveal that most training corpora are likely contaminated with both sensitive content and evaluation benchmarks, leading to inflated test set performance. These methods sometimes fail to detect samples from the pretraining data, primarily because they depend on statistics composed of causal token likelihoods. We introduce Infilling Score, a new test-statistic based on non-causal token likelihoods. Infilling Score can be computed for autoregressive models without re-training using Bayes rule. A naive application of Bayes rule scales linearly with the vocabulary size. However, we propose a ratio test-statistic whose computation is invariant to vocabulary size. Empirically, our method achieves a significant accuracy gain over state-of-the-art methods including Min-K% and Min-K%++ on the WikiMIA benchmark across seven models with different parameter sizes. Further, we achieve higher AUC compared to reference-free methods on the challenging MIMIR benchmark. Finally, we create a benchmark dataset consisting of recent data sources published after the release of Llama-3; this benchmark provides a statistical baseline to indicate potential corpora used for Llama-3 training.
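As a rough illustration of why a naive use of Bayes rule scales with vocabulary size, the non-causal likelihood of a token given both its past and future context can be written in terms of causal likelihoods as below. The notation (token x_i, vocabulary V, model distribution p) is assumed here for illustration and is not taken from the paper, and this is the generic Bayes identity, not the paper's ratio test-statistic.

```latex
% Sketch only: the symbols x_i, V, and p are illustrative assumptions.
% Numerator: causal likelihood of x_i times the likelihood of the future tokens.
% Denominator: the same product summed over every candidate token v in V,
% which is the step whose cost grows linearly with |V|.
p(x_i \mid x_{<i}, x_{>i})
  = \frac{p(x_i \mid x_{<i}) \, p(x_{>i} \mid x_{<i}, x_i)}
         {\sum_{v \in V} p(v \mid x_{<i}) \, p(x_{>i} \mid x_{<i}, v)}
```

Each term of the sum requires re-scoring the future tokens with a different candidate v substituted at position i, which is why a statistic that avoids this normalization would not depend on the vocabulary size.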
Full Text Available
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos

Luo, M; Xue, Z; Dimakis, A; Grauman, K (March 2024, https://doi.org/10.48550/arXiv.2403.06351)

We investigate exocentric-to-egocentric cross-view translation, which aims to generate a first-person (egocentric) view of an actor based on a video recording that captures the actor from a third-person (exocentric) perspective. To this end, we propose a generative framework called Exo2Ego that decouples the translation process into two stages: high-level structure transformation, which explicitly encourages cross-view correspondence between exocentric and egocentric views, and a diffusion-based pixel-level hallucination, which incorporates a hand layout prior to enhance the fidelity of the generated egocentric view. To pave the way for future advancements in this field, we curate a comprehensive exo-to-ego cross-view translation benchmark. It consists of a diverse collection of synchronized ego-exo tabletop activity video pairs sourced from three public datasets: H2O, Aria Pilot, and Assembly101. The experimental results validate that Exo2Ego delivers photorealistic video results with clear hand manipulation details and outperforms several baselines in terms of both synthesis quality and generalization ability to new actions.
Full Text Available
Provable Lipschitz Certification for Generative Models

Jordan, M; Dimakis, A (July 2021, International Conference on Machine Learning (ICML))
Full Text Available
Solving Inverse Problems with a Flow-based Noise Model

Whang, J; Lei, Q; Dimakis, A (July 2021, International Conference on Machine Learning (ICML))
Full Text Available
Composing Normalizing Flows for Inverse Problems

Whang, J; Lindgren, E; Dimakis, A (July 2021, International Conference on Machine Learning (ICML))
Full Text Available
Intermediate Layer Optimization for Inverse Problems using Deep Generative Models

Daras, G; Dean, J; Jalal, A; Dimakis, A (July 2021, International Conference on Machine Learning (ICML))
Full Text Available
Instance-Optimal Compressed Sensing via Posterior Sampling

Jalal, A; Karmalkar, S; Dimakis, A; Price, E (July 2021, International Conference on Machine Learning (ICML))
Full Text Available
Fairness for Image Generation with Uncertain Sensitive Attributes

Jalal, A; Karmalkar, S; Hoffmann, J; Dimakis, A; Price, E (July 2021, International Conference on Machine Learning (ICML))
Full Text Available
SMYRF: Efficient Attention using Asymmetric Clustering

Daras, G; Kitaev, N; Odena, A; Dimakis, A (October 2020, Advances in neural information processing systems)
Full Text Available
